29 research outputs found

    PENCIL: Towards a Platform-Neutral Compute Intermediate Language for DSLs

    We motivate the design and implementation of a platform-neutral compute intermediate language (PENCIL) for productive and performance-portable accelerator programming.

    MLPerf Inference Benchmark

    Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.
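
    As a concrete illustration of the kind of measurement the benchmark standardizes, the sketch below (in C, assuming a POSIX clock) times a stand-in inference call one query at a time, in the spirit of MLPerf's single-stream scenario, and reports a tail-latency percentile. It is a minimal sketch, not MLPerf's LoadGen; run_inference() is a hypothetical placeholder for a real model.

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define QUERIES 1000

        /* Hypothetical stand-in for a real model's inference call. */
        static void run_inference(void) {
            struct timespec work = {0, 2000000}; /* pretend 2 ms of work */
            nanosleep(&work, NULL);
        }

        static int cmp_double(const void *a, const void *b) {
            double x = *(const double *)a, y = *(const double *)b;
            return (x > y) - (x < y);
        }

        int main(void) {
            double lat_ms[QUERIES];
            for (int i = 0; i < QUERIES; i++) {
                struct timespec t0, t1;
                clock_gettime(CLOCK_MONOTONIC, &t0);
                run_inference();
                clock_gettime(CLOCK_MONOTONIC, &t1);
                lat_ms[i] = (t1.tv_sec - t0.tv_sec) * 1e3
                          + (t1.tv_nsec - t0.tv_nsec) / 1e6;
            }
            /* Sort and report a tail percentile; MLPerf's single-stream
             * scenario scores systems on 90th-percentile latency. */
            qsort(lat_ms, QUERIES, sizeof lat_ms[0], cmp_double);
            printf("p90 latency over %d queries: %.3f ms\n",
                   QUERIES, lat_ms[(QUERIES * 90) / 100]);
            return 0;
        }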

    PENCIL: A Platform-Neutral Compute Intermediate Language for DSL Compilers

    Programming accelerators such as GPUs with low-level APIs and languages like OpenCL and CUDA is difficult, error prone, and not performance-portable. Automatic parallelization and domain-specific languages (DSLs) have been proposed to hide this complexity and to regain some performance portability. In this presentation, I will introduce PENCIL (Platform-Neutral Compute Intermediate Language) and give some details about how it is compiled. PENCIL is a rigorously defined subset of GNU C99 with specific programming rules and a few extensions. Adherence to this subset and the use of these extensions enable compilers to exploit parallelism and to better optimize code when targeting accelerators. We intend PENCIL both as a portable language to facilitate accelerator programming and as an intermediate language for DSL compilers. We validate the potential of PENCIL on a state-of-the-art polyhedral compiler, extending the applicability of the compiler to dynamic, data-dependent control flow and non-affine array accesses.
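
    To make the design concrete, the sketch below shows the flavor of PENCIL code: plain C99 with annotations that tell a polyhedral compiler what it cannot prove itself, here for a loop with a data-dependent, non-affine subscript. The pragma and builtin spellings (pencil independent, __pencil_assume) follow the PENCIL papers but are shown illustratively; the fallback macro merely lets the sketch compile with an ordinary C compiler.

        /* Fallback so the sketch compiles without a PENCIL toolchain;
         * a real toolchain supplies the builtin. */
        #ifndef __PENCIL__
        #define __pencil_assume(cond) ((void)0)
        #endif

        /* A gather with a data-dependent subscript, src[idx[i]].  The
         * annotations expose parallelism the compiler cannot prove. */
        void gather(const int n,
                    float dst[const restrict static n],
                    const float src[const restrict static n],
                    const int idx[const restrict static n])
        {
            /* Promise to the compiler: a precondition every caller meets. */
            __pencil_assume(n > 0);

            /* Programmer-asserted independence of iterations, so the
             * loop may be parallelized despite the non-affine access. */
            #pragma pencil independent
            for (int i = 0; i < n; i++)
                dst[i] = src[idx[i]];
        }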

    Benchmarking TinyML Systems: Challenges and Direction

    Recent advancements in ultra-low-power machine learning (TinyML) hardware promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark for these systems. Benchmarking allows us to measure and thereby systematically compare, evaluate, and improve the performance of systems, and is therefore fundamental to a field reaching maturity. In this position paper, we present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads. Furthermore, we present our four benchmarks and discuss our selection methodology. Our viewpoints reflect the collective thoughts of the TinyMLPerf working group, which comprises over 30 organizations.

    [Processor Architectures]: Multiple Data Stream Architectures

    We have developed a bit-reversal algorithm (BRAVO) using vector permute operations that is optimal in the number of permutations, along with a cache-optimal version (CO-BRAVO). Our implementation on a PowerMac G5 shows a 2–4.5-fold improvement for small data sets and a 15–75% improvement for large data sets (depending on the data element size) over the best known approach (COBRA).
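
    For reference, the permutation BRAVO computes is easy to state in scalar C. The sketch below is a plain reference version, not the paper's algorithm: BRAVO realizes the same permutation with SIMD vector permute instructions (AltiVec on the PowerMac G5), and CO-BRAVO is its cache-optimal variant.

        #include <stdint.h>

        /* Reverse the low log2n bits of x. */
        static uint32_t reverse_bits(uint32_t x, int log2n) {
            uint32_t r = 0;
            for (int b = 0; b < log2n; b++) {
                r = (r << 1) | (x & 1u);
                x >>= 1;
            }
            return r;
        }

        /* In-place bit-reversal permutation of n = 2^log2n elements:
         * element i trades places with the element at the bit-reversed
         * index of i (the classic pre-pass of radix-2 FFTs). */
        void bit_reverse(float *a, int log2n) {
            uint32_t n = 1u << log2n;
            for (uint32_t i = 0; i < n; i++) {
                uint32_t j = reverse_bits(i, log2n);
                if (j > i) {
                    float t = a[i]; a[i] = a[j]; a[j] = t;
                }
            }
        }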